
ps multiquery concurrent#63

Open
sagehen03 wants to merge 1 commit into master from ps-multiquery-concurrent

Conversation

@sagehen03
Contributor

  • adding enrichr end point
  • [Multiquery] Allow multiple queries in a single call
  • Fix so that it doesn't connect to S3 multiple times
  • remove combine with method
  • [Multiquery] Initial narrow use concurrent multiquery
  • remove print

@psmadbec psmadbec force-pushed the ps-multiquery-concurrent branch from d3a8574 to 36f65cb Compare January 9, 2026 18:46
Contributor Author

@sagehen03 sagehen03 left a comment


I have 3 comments, made out of an abundance of caution.

    # start reading the records on-demand
    self.record_filter = record_filter
    self.records = self._readall()
    if self.bytes_total <= config.response_limit:
Contributor Author


Suggested change:

    if self.bytes_total <= config.response_limit:
    if self.bytes_total <= config.response_limit and len(self.sources) > 1:

I'd vote for only using the new _readparallel method when we have multiple queries or multiple files to process; otherwise we're spinning up a thread pool where we didn't previously.
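A minimal sketch of the gating being suggested, with illustrative class and method names (the PR's actual `Reader` internals may differ): the thread pool is only created on the multi-source path, and the single-source case stays a plain serial read.

```python
import concurrent.futures


class Reader:
    """Illustrative reader that gates between serial and parallel reads."""

    def __init__(self, sources):
        self.sources = sources

    def _read_source(self, source):
        # Stand-in for the real per-source S3 read.
        return list(source)

    def _read_serial(self):
        for source in self.sources:
            yield from self._read_source(source)

    def _read_parallel(self):
        # The pool is only created on this path, so single-source
        # requests pay no thread-pool overhead.
        with concurrent.futures.ThreadPoolExecutor(max_workers=10) as pool:
            for records in pool.map(self._read_source, self.sources):
                yield from records

    def readall(self):
        if len(self.sources) > 1:
            return list(self._read_parallel())
        return list(self._read_serial())
```

`ThreadPoolExecutor.map` yields results in submission order, so parallel reads still come back in source order.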

    A generator that reads each of the records from S3 for the sources.
    """
    record_map = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as pool:
Contributor Author

@sagehen03 sagehen03 Jan 9, 2026


We're creating a thread pool for each read-parallel request here. I think it might be better to have a global pool that we use. It's a small concern, but we'd end up resource-starved pretty quickly if a handful of requests came in at once and each created its own thread pool. If we had a global pool with, say, 8-10 threads, then requests could queue waiting for available threads.
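The global-pool idea could look roughly like this (the pool size and function names here are assumptions, not the PR's code): one module-level executor shared by every request, so concurrent requests queue for workers instead of each spinning up its own pool.

```python
import concurrent.futures

# One shared pool for the whole process. Concurrent requests submit
# work here and queue for a free worker, rather than each request
# constructing (and tearing down) its own ThreadPoolExecutor.
_SHARED_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=8)


def read_parallel(read_fn, sources):
    """Read every source on the shared pool, preserving source order."""
    futures = [_SHARED_POOL.submit(read_fn, source) for source in sources]
    records = []
    for future in futures:
        # result() blocks until that source's read completes.
        records.extend(future.result())
    return records
```

Collecting futures in submission order keeps the output deterministic even though the reads themselves run concurrently.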


    if self.record_filter is None or self.record_filter(record):
    if source.record_filter is None or source.record_filter(record):
    self.count += 1
Contributor Author


Now that we're calling _readsource from multiple threads and using the same RecordSource object, I think we have state shared between multiple threads. This probably only means some disagreement between the response metadata and the data, but I think we can fix it with either synchronization or stack-confined variables.
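The stack-confinement option could be sketched like this (illustrative names; the real RecordSource fields may differ): each worker keeps its count in a local variable and returns it, and the counts are merged in a single thread after the pool finishes, so no mutable state is shared while reads are in flight.

```python
import concurrent.futures


def read_source(source, record_filter=None):
    """Worker: count lives in a local variable (stack-confined),
    so nothing is mutated across threads during the read."""
    records, count = [], 0
    for record in source:
        if record_filter is None or record_filter(record):
            records.append(record)
            count += 1
    return records, count


def read_all(sources, record_filter=None, max_workers=4):
    """Fan out reads, then merge counts in the calling thread only."""
    records, total = [], 0
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        for recs, count in pool.map(
            lambda s: read_source(s, record_filter), sources
        ):
            records.extend(recs)
            total += count  # merged serially; no lock needed
    return records, total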

